Empirical Information Metrics for Prediction Power and Experiment Planning

Author

  • Christopher J. Lee
Abstract

In principle, information theory could provide useful metrics for statistical inference. In practice this is impeded by divergent assumptions: Information theory assumes the joint distribution of variables of interest is known, whereas in statistical inference it is hidden and is the goal of inference. To integrate these approaches we note a common theme they share, namely the measurement of prediction power. We generalize this concept as an information metric, subject to several requirements: Calculation of the metric must be objective or model-free; unbiased; convergent; probabilistically bounded; and low in computational complexity. Unfortunately, widely used model selection metrics such as Maximum Likelihood, the Akaike Information Criterion and Bayesian Information Criterion do not necessarily meet all these requirements. We define four distinct empirical information metrics measured via sampling, with explicit Law of Large Numbers convergence guarantees, which meet these requirements: Ie, the empirical information, a measure of average prediction power; Ib, the overfitting bias information, which measures selection bias in the modeling procedure; Ip, the potential information, which measures the total remaining information in the observations not yet discovered by the model; and Im, the model information, which measures the model’s extrapolation prediction power. Finally, we show that Ip + Ie, Ip + Im, and Ie − Im are fixed constants for a given observed dataset (i.e. prediction target), independent of the model, and thus represent a fundamental subdivision of the total information contained in the observations. We discuss the application of these metrics to modeling and experiment planning. Information 2011, 2 18
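The abstract does not give formulas, so the following is only a rough illustration of what "average prediction power measured via sampling" could look like. It assumes one plausible reading: the mean log-likelihood gain (in bits) of a model over a baseline distribution, averaged over held-out observations. The function names, the Gaussian model, and the baseline are all hypothetical choices made for this sketch and are not taken from the paper.

```python
import math
import random


def gaussian_logpdf(x, mu, sigma):
    """Natural-log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def average_log_gain(sample, model_logpdf, baseline_logpdf):
    """Mean of log2(model(x) / baseline(x)) over the sample, in bits.

    Assumption: this sample average stands in for an 'empirical
    information' style metric; it is not the paper's exact definition.
    """
    total_nats = sum(model_logpdf(x) - baseline_logpdf(x) for x in sample)
    return total_nats / (len(sample) * math.log(2))


# Hypothetical held-out observations drawn from N(1, 1).
rng = random.Random(0)
held_out = [rng.gauss(1.0, 1.0) for _ in range(5000)]


def model(x):
    # Model under evaluation: matches the sampling distribution.
    return gaussian_logpdf(x, 1.0, 1.0)


def baseline(x):
    # Broad, uninformative baseline to compare against.
    return gaussian_logpdf(x, 0.0, 3.0)


gain_bits = average_log_gain(held_out, model, baseline)
print(f"average log-likelihood gain: {gain_bits:.3f} bits per observation")
```

Because the estimate is a plain sample average of independent terms, the Law of Large Numbers applies directly: as the held-out sample grows, the value converges to the expected log-likelihood ratio under the true distribution. Scoring on held-out rather than training data is what keeps selection bias from the fitting procedure out of the estimate.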

Similar resources

Basic Experiment Planning via Information Metrics: the RoboMendel Problem

In this paper we outline some mathematical questions that emerge from trying to “turn the scientific method into math”. Specifically, we consider the problem of experiment planning (choosing the best experiment to do next) in explicit probabilistic and information theoretic terms. We formulate this as an information measurement problem; that is, we seek a rigorous definition of an information m...

Building a Comprehensive Conceptual Framework for Power Systems Resilience Metrics

Recently, the frequency and severity of natural and man-made disasters (extreme events), which are high-impact, low-frequency (HILF) events, have increased. These disasters can lead to extensive outages, damage, and costs in electric power systems. A power system must be built with “resilience” against disasters, which means its ability to withstand disasters efficiently while ensuring the ...

HUMAN METRICS AFFECTING SUPPLY CHAIN PERFORMANCE: AN EMPIRICAL STUDY OF INDIAN MANUFACTURING ORGANIZATIONS

Manufacturing organizations today compete supply chain against supply chain. Existing research fails to relate human metrics to supply chain performance. The authors intend to empirically assess the effects of human metrics on supply chain performance in the context of Indian manufacturing organizations. A rigorous literature review has identified 12 variables. The...

Significance of Different Software Metrics in Defect Prediction

This paper presents an empirical analysis of the significance of different process and product metrics in defect prediction models. 48 releases of 15 open-source and 38 releases of 7 proprietary projects were investigated. Pearson correlation coefficients with the number of defects were calculated for each of the metrics. Subsequently, defect prediction models were built using linear st...

An Information Theoretic Evaluation of Software Metrics for Object Lifetime Prediction

Accurate object lifetime prediction can be exploited by allocators to improve the performance of generational garbage collection by placing immortal or long-lived objects directly into immortal or old generations. Object-oriented software metrics are emerging as viable indicators for object lifetime prediction. This paper studies the correlation of various metrics with object lifetimes. However...

Journal title:
  • Information

Volume 2  Issue 

Pages  -

Publication date 2011